All Questions
3 questions
0 votes
1 answer
164 views
Reward not improving for a custom environment using PPO
I've been trying to train an agent on a custom environment I implemented with gym, where the goal is to resolve voltage violations in a power grid by adjusting the active power (loads) at each node. I ...
0 votes
1 answer
290 views
Why is PPO not choosing a solution that is giving a higher cumulative reward?
I use PPO to train my fermenter (digital twin) to maximize enzyme (product) production. Action: 1 or 0, i.e., whether or not to add substrate at a particular time, based on the cells and enzymes present in the tank ...
1 vote
1 answer
478 views
Getting always the same action on an A2C from stable_baselines3
I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that ...